Abstract

Background: In medicine, research misconduct is historically associated with laboratory or pharmaceutical research,
but the vulnerability of epidemiological surveys should be recognized. As these surveys underpin health policy and
allocation of limited resources, misreporting can have far-reaching implications. We report how fraud in a nationwide 
headache survey occurred and how it was discovered and rectified before it could cause harm. 

Methods: The context was a door-to-door survey to estimate the prevalence and burden of headache disorders in 
Pakistan. Data were collected from all four provinces of Pakistan by non-medical interviewers and collated centrally. 
Measures to ensure data integrity were preventative, detective and corrective. We carefully selected and trained the 
interviewers, set rules of conduct and gave specific warnings regarding the consequences of falsification. We employed 
two-fold fraud detection methods: comparative data analysis, and face-to-face re-contact with randomly selected 
participants. When fabrication was detected, data shown to be unreliable were replaced by repeating the survey in 
new samples according to the original protocol. 

Results: Comparative analysis of datasets from the regions revealed unfeasible prevalences and gender ratios in one 
(Multan). Data fabrication was suspected. During a surprise-visit to Multan, of a random sample of addresses selected 
for verification, all but one had been falsely reported. The data (from 840 cases) were discarded, and the survey 
repeated with new interviewers. The new sample of 800 cases was demographically and diagnostically consistent 
with other regions. 

Conclusion: Fraud in community-based surveys is seldom reported, but no less likely to occur than in other fields of 
medical research. Measures should be put in place to prevent, detect and, where necessary, correct it. In this instance, 
had the data from Multan been pooled with those from other regions before analysis, a damaging fraud might have 
escaped notice. 

Keywords: Fraud; Research misconduct; Epidemiology; Headache; Pakistan; Global Campaign against Headache 



Background 

Research misconduct includes fabrication, falsification or 
plagiarism in proposing, performing or reviewing research, 
or in reporting research results [1]. It appears to be common:
Fanelli's 2009 systematic review and meta-analysis of
survey data found almost 2% of scientific researchers ad- 
mitted having fabricated, falsified or modified data or re- 
sults at least once [2]. In medicine, research misconduct is 
historically associated with laboratory or pharmaceutical 
research but has been uncovered in a range of clinical and
genetic studies (e.g., [3-11]). In such circumstances the vulnerability
to misconduct of epidemiological or population-based
surveys should be recognized. As such surveys are
performed to assess the burden of a disease, to underpin 
needs assessment and inform health policy involving the 
allocation of usually limited resources, research miscon- 
duct and failure to detect it can have major and far- 
reaching implications. 

With the availability of electronic data loggers, portable 
touch-screen computers, on-line maps and GPS trackers, 
data collection in many environments has become paper- 
free and much easier. These uses of technology have 
facilitated quality control over data collection, leaving
fewer ways to cheat without being discovered. However, in 
developing countries where access to technology is limited 
and data collection is still mainly paper-based, multiple 
safeguards may need to be employed to maintain quality 
assurance and prevent misconduct and its consequences. 

We report here how fraud in a nationwide epidemio- 
logical headache survey occurred and how it was discov- 
ered and rectified before it could cause harm. The context 
was a door-to-door survey to estimate the prevalence and 
burden of primary headache disorders in Pakistan. The 
protocol for the survey, designed according to standard 
principles [12], required data collection by hired non- 
medical interviewers from participants in six major cities 
across the four provinces of Pakistan, and from rural areas
neighbouring each city. The expected procedure was to 
call at randomly-selected households unannounced, list 
the adult household members in each, select one ran- 
domly and interview that person (returning by appoint- 
ment to do so if he or she was not present at the initial 
visit). The interview followed a structured questionnaire, in- 
cluding demographic enquiry, screening and diagnostic 
headache questions, and further enquiry into headache- 
attributed burden when appropriate. Full details of the sur- 
vey methodology have been published previously [13]. The 
survey was eventually completed by 4,223 respondents. 

Methods 

Measures set out within the study protocol and under- 
taken to ensure data integrity were preventative, detect- 
ive and corrective. 

Prevention 

We carefully selected and trained the interviewers, set 
rules of conduct for them, gave specific warnings regard- 
ing the consequences of suspected and proven falsifica- 
tion, provided adequate and equitable compensation, set 
up effective lines of communication, undertook in-field 
supervision during data collection, and demanded regu- 
lar reporting. 

At the outset of the study, we engaged an interviewer re- 
cruitment agency with experience in health-care related 
field surveys all over the country. We explained the pur- 
pose and design of the study. We advertised for and se- 
lected interviewers who had a track-record of reliability, 
could speak the local (provincial) language and could read 
and write in Urdu fluently, and hired them on monthly 
salaries. There were two interviewers in each of the six 
survey locations, except Lahore with four to accommodate 
its larger size. We called all fourteen to the main centre 
(Karachi) for a two-day workshop and trained them 
according to a set training protocol which included a) face-to-face
meetings with all co-investigators and introductions
to the supervising co-investigators for each location,
b) the purpose and goals of the study, c) its importance
and likely impact, d) an overview of headache disorders, e) 
administration of the structured questionnaire, f) mock 
interview sessions, g) a question and answer session and 
h) discussion and resolution of any queries. Afterwards 
they returned to their respective cities and the question- 
naires, weighing machines, measuring tapes and stationery 
bags were mailed to them. All expenses were reimbursed.

One of each pair or foursome of interviewers was 
appointed location supervisor. 

During data collection, we monitored the interviewer- 
groups by regular telephone calls and location supervisors 
provided regular updates on progress. One co-investigator 
was responsible for each location. We made occasional an- 
nounced field visits in the more accessible locations, and 
used these to resolve any emerging problems, passing the 
experience to all other locations. Special requests to over- 
come cultural sensitivities (such as hiring local female 
health workers) were met. 

The data were couriered to the principal centre in 
Karachi at regular intervals. 

Detection 

We employed two-fold (belt-and-braces) fraud detection
methods at all locations: comparative data analysis, and 
face-to-face re-contact with randomly selected participants. 

Throughout the data-collection period, completed 
questionnaires received in Karachi were numbered and 
inspected for obvious irregularities. The data were en- 
tered onto computer by the data-entry team. Compara- 
tive analyses were made between each location and the 
others for unexpected differences. 
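
As an illustration only (this is not the procedure used in the survey), a between-location comparison of this kind can be sketched in Python. The sketch flags any location whose prevalence lies far from the median of all locations, using a modified z-score based on the median absolute deviation. The function name, the threshold of 3.5 and the counts for sites A to E are assumptions made for the example; only the Multan figure (433 of 842) is taken from Table 1.

```python
from statistics import median


def flag_outlier_locations(counts, threshold=3.5):
    """Return locations whose prevalence is far from the median of all locations.

    counts maps a location name to (cases, sample_size).
    Deviations are scaled by the median absolute deviation (MAD), so one
    aberrant location does not distort the yardstick used to judge it.
    """
    prevalence = {loc: cases / n for loc, (cases, n) in counts.items()}
    centre = median(prevalence.values())
    mad = median(abs(p - centre) for p in prevalence.values())
    if mad == 0:
        # Deviations cannot be scaled; inspect the data manually instead.
        return []
    flagged = []
    for loc, p in prevalence.items():
        score = 0.6745 * (p - centre) / mad  # modified z-score
        if abs(score) > threshold:
            flagged.append(loc)
    return flagged


# Hypothetical migraine counts for five locations (assumed for illustration),
# plus the fraudulent Multan dataset as reported in Table 1.
counts = {
    "site_A": (180, 700),
    "site_B": (170, 690),
    "site_C": (200, 720),
    "site_D": (165, 650),
    "site_E": (175, 700),
    "Multan": (433, 842),
}
print(flag_outlier_locations(counts))
```

A median-based yardstick is chosen here because a comparison against the pooled mean of the other locations would itself be shifted by the aberrant location.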

Re-contact consisted of one surprise-visit by the co- 
investigators to each location in the latter half of the data 
collection period. Interviewers were given short notice (no 
more than a few hours) of our arrival. We randomly se- 
lected 10-30 questionnaires at each location, met the in- 
terviewers and accompanied them to the respective 
households. At each, the interviewer waited outside, out of 
sight, while a co-investigator sought entry to the house, 
asked about the recent survey visit and requested a de- 
scription of the interviewer. If the original participant was 
available, the interview was repeated. Second question- 
naires were later compared manually with those filled by 
the interviewers. 
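
The random selection of questionnaires for re-contact can be sketched as follows. This is a minimal example under stated assumptions, not a record of how selection was actually performed: the function name, the identifier format and the fixed seed are hypothetical. A recorded seed makes the draw reproducible for audit while remaining unpredictable to interviewers who do not know it.

```python
import random


def select_for_recontact(questionnaire_ids, k, seed=None):
    """Draw k questionnaires at random, without replacement, for verification."""
    rng = random.Random(seed)
    ids = list(questionnaire_ids)
    k = min(k, len(ids))  # cannot draw more than were returned
    return sorted(rng.sample(ids, k))


# Hypothetical identifiers for 300 questionnaires returned from one location.
ids = [f"Q{i:03d}" for i in range(1, 301)]
selected = select_for_recontact(ids, 20, seed=2024)
print(len(selected), selected[:5])
```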

We focused our attention on any location where suspi- 
cions had arisen during preventative measures or data 
comparison. 

Correction 

Full corrective measures required that data shown to be 
unreliable were excluded from the survey analysis and 
replaced by repeating the survey in new samples accord- 
ing to the original protocol. 
Results

In the later stages of data collection, an interviewer at 
one location (Multan) reported involvement in a car ac- 
cident, and requested more time because only one inter- 
viewer was working. This centre began falling behind its 
daily target. Since two other centres were also slightly 
behind target, we extended the period of data collection 
by two months. By the end of this extension, the Multan 
interviewers still had not returned their rural sample of 
questionnaires or those from one urban cluster-sample. 

During the surprise-visit to Multan, the interviewers 
brought all the outstanding questionnaires but were not 
cooperative with the data authentication procedures. They 
declared themselves unavailable for the task in the near fu- 
ture, citing unspecified "personal reasons". Of the random 
sample of addresses selected for verification, only one 
could be found; later it transpired that the others were 
falsely reported. 

These circumstances inevitably created strong doubts 
over the authenticity of the data. Comparative data ana- 
lysis revealed significant discrepancies in the Multan data: 
the demographics of the sample were noticeably dissimilar 
to those reported by the Pakistan Federal Bureau of Statis- 
tics (FBS) from the last census of Pakistan in 1999, which 
was extrapolated to 2006 [14] (Table 1), and the preva- 
lences and gender-distributions of headache disorders 
did not match expected statistics or those from other
locations. We concluded that the interviewers
had not visited the rural areas but, instead,
fraudulently filled in the questionnaires with invented 
data. 

We deemed the data from the entire region unusable. 
We repeated data collection in Multan with different in- 
terviewers employed under legal contracts that made them 
liable in the event of fraud or dishonesty. They were paid 
on delivery and successful verification of questionnaires, 
rather than on a monthly basis, removing the incentive of 
monetary gain by deliberately prolonging the data collec- 
tion phase. 

The two-day field visit for authentication of data was 
made after delivery of 300 of the required 800 question- 
naires. We randomly selected 10% (80) from different 
clusters in Multan City and its adjoining rural areas. We 
disclosed the addresses of the selected households to the in- 
terviewers on the day of our visit. Interviewers were obliged 
by their contracts to accompany the co-investigator to these 
households. All 80 households were located, and their par- 
ticipants verified; all recognized their interviewers. 
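
What a clean verification result of this size can and cannot show may be illustrated with a short calculation. The sketch below assumes, for simplicity, that the 80 households were a simple random sample from a much larger set of questionnaires; this is an assumption of the example, not a claim about the survey design. With no failures among n checks, the exact one-sided upper confidence bound on the true proportion of unverifiable questionnaires is 1 - alpha^(1/n).

```python
def zero_failure_upper_bound(n_checked, confidence=0.95):
    """Upper confidence bound on the failure proportion when 0 of n checks fail."""
    if n_checked <= 0:
        raise ValueError("n_checked must be positive")
    alpha = 1 - confidence
    return 1 - alpha ** (1 / n_checked)


bound = zero_failure_upper_bound(80)
print(f"{bound:.3f}")  # 0.037
```

Under these assumptions, 80 verified households out of 80 are compatible with a true rate of unverifiable questionnaires of up to about 3.7%, close to the "rule of three" approximation of 3/80.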

This re-survey in Multan was completed in 3 months. 
The fabricated data were withdrawn from the database and 
replaced with the new data. Table 1 compares the two data- 
sets. The demographic data show a reversed male:female 
ratio and an unfeasible bimodal age distribution in the 
fraudulent dataset, with a migraine prevalence of 51.4%. 



Table 1 Comparisons between fraudulent and new datasets in Multan, and national demographic statistics

                              Fraudulent data (%)   New data (%)   FBS data (%)
                              n = 842               n = 800

Gender
  Male                        70.1                  43.4           49.7
  Female                      29.9                  56.6           50.3
Age (yr)
  18-29                       16.9                  29.5           36.4
  30-39                       6.5                   32.3           25.4
  40-49                       64.8                  20.6           20.5
  50-59                       10.8                  12.3           13.1
  60-65                       0.7                   5.4            4.6
Marital status
  Married                     90.5                  82.5           n/a
  Unmarried                   8.8                   14.9           n/a
  Divorced                    0.1                   1.9            n/a
Headache % (n)
  No headache                 27.9 (235)            13.3 (106)
  Migraine                    51.4 (433)            25.4 (203)
    male                      70.7 (306)            32.0 (65)
    female                    29.3 (127)            68.0 (138)
  TTH                         20.0 (168)            46.3 (370)
  Headache on >15 days/month  0.1 (1)               12.4 (99)
  MOH                         0.1 (1)               2.0 (16)
  Undetermined                0.5 (4)               0.8 (6)

FBS: Federal Bureau of Statistics data from 1999 survey extrapolated to 2006 [14]; TTH: tension-type headache; MOH: probable medication-overuse headache;
n/a: not available.
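
The demographic comparison in Table 1 can also be expressed numerically. The sketch below is illustrative and was not part of the original analysis: it computes a chi-square goodness-of-fit statistic for each age distribution against the FBS reference, using the percentages in Table 1. The function name is hypothetical, and percentages are rescaled to sum to 100 because the published figures are rounded.

```python
def chi_square_vs_reference(sample_pct, reference_pct, n):
    """Chi-square distance between a sample distribution and a reference.

    Both distributions are given as percentages per category; n is the
    sample size used to convert them to observed and expected counts.
    """
    total_s = sum(sample_pct)
    total_r = sum(reference_pct)
    stat = 0.0
    for s, r in zip(sample_pct, reference_pct):
        observed = n * s / total_s
        expected = n * r / total_r
        stat += (observed - expected) ** 2 / expected
    return stat


# Age bands 18-29, 30-39, 40-49, 50-59, 60-65 (values from Table 1).
fbs = [36.4, 25.4, 20.5, 13.1, 4.6]
fraudulent = [16.9, 6.5, 64.8, 10.8, 0.7]
new = [29.5, 32.3, 20.6, 12.3, 5.4]

chi_fraud = chi_square_vs_reference(fraudulent, fbs, 842)
chi_new = chi_square_vs_reference(new, fbs, 800)
print(round(chi_fraud), round(chi_new))
```

On these figures the statistic for the fraudulent age distribution is more than ten times that for the new sample. With samples of this size even the new data differ measurably from the extrapolated census figures, so the statistic is better read as a relative distance than as a pass or fail test.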
Discussion

A recent review noted that scientific misconduct is on the 
rise [15]. Whether or not this is true (rather than increasing 
awareness of it - or greater willingness to recognize it), fab- 
rication or modification of research data is clearly common 
[2] and can have far-reaching consequences. It is obvious 
that decisions based on falsified data regarding treatments, 
health-care priorities, health policy and health-resource al- 
location may be seriously misguided [3,8,9]. Future research 
unknowingly built upon fabricated data may be disastrously 
misled [10]. 

The usual motivation for falsifying research is monetary 
gain, either directly or, in academic circles, through career 
advancement [16]. In this case, simple laziness was an al- 
ternative explanation, but the truth was probably more 
complicated. In epidemiological research, committed in- 
vestigators may plan and organize every step of a survey 
but data collection often depends on hired interviewers 
with no personal interest in the research. Not least be- 
cause data collection is a time-consuming and commonly 
tedious process, vulnerability to fraud is high. It seems im- 
portant to recognise this. Although a certain amount of 
trust is necessary for the implementation of a study, it is 
unfortunately but clearly necessary to implement quality 
checks [17]. The quality-assurance methods utilized here 
were pioneered in an LTB-sponsored study in India [18]. 

It is salutary to note that preventative measures alone 
were not sufficient here; detective measures were needed 
also. In this instance, the fraud was unsophisticated, and 
therefore readily detected - once it had been suspected. 
Successful data fabrication requires some understanding 
of what the data should look like, which the miscreant interviewers
lacked. Nor, it seems, were they practised
fraudsters: they apparently did not employ the common
technique (in fraud) of properly recording data from an 
initial relatively small sample and then reproducing these 
data repeatedly with minor changes - which produces a 
large dataset with a degree of verisimilitude (unless, by 
chance, the initial sample happened to be atypical). Never- 
theless, without quality assurance, the Multan data might 
simply have been pooled with those from the other loca- 
tions, and the discrepancies, though still misleadingly in- 
fluential upon the survey as a whole, would not then have 
been obvious. 

Quality assurance measures add to study costs, and na- 
tional surveys are not done cheaply: human resource and 
travel costs are high. But the greater cost to us - both fi- 
nancially and in lost time - was in having to discard data 
from over 800 participants and repair the survey by re- 
peating a large part of it [17]. 

We learnt some lessons. We would have done better at 
the outset to introduce legally-binding contracts rather 
than informal understandings, although this might not be 
true, or feasible, in all cultures. Interviewers should have
been paid on successful delivery and after initial analysis
of data, rather than on a monthly basis. Field visits would
probably have been better conducted earlier during the
data collection phase, although, since the problems arose 
with rural data collection, and most interviewers com- 
pleted urban data collection first, this might have been 
falsely reassuring. 

Conclusion 

Fraud in community-based surveys is seldom reported, 
but it occurs and it should not be assumed to do so less 
frequently than in other fields of research. This incident 
and its aftermath are reported to highlight the need for an- 
ticipation, prevention, detection and, when it is discovered, 
correction of fraud in future community-based interviewer- 
dependent surveys. 